Sequential Monte Carlo Methods for Option Pricing 

BY AJAY JASRA and PIERRE DEL MORAL 

Department of Mathematics, Imperial College London, SW7 2AZ, London, UK, 

o: 

CN . Ajay.Jasra@ic.ac.uk 

X- 

c^ . Centre INRIA Bordeaux et Sud- Quest & Institut de Mathematiques de Bordeaux, 



(N 

o 
u 

-(— > 

C/3 



Universite de Bordeaux I, 33405, France,, 
Pierre . Del-Moral@inria . f r 



Abstract 



Q"^ , In the following paper we provide a review and development of sequential Monte 

l> 

'sj" ' Carlo (SMC) methods for option pricing. SMC are a class of Monte Carlo-based 

in 

f""^ ■ algorithms, that are designed to approximate expectations w.r.t a sequence of related 

o: 

probability measures. These approaches have been used, successfully, for a wide 
class of applications in engineering, statistics, physics and operations research. SMC 

X" 

$H ' methods are highly suited to many option pricing problems and sensitivity/Greek 

calculations due to the nature of the sequential simulation. However, it is seldom the 
case that such ideas are explicitly used in the option pricing literature. This article 
provides an up-to date review of SMC methods, which are appropriate for option 
pricing. In addition, it is illustrated how a number of existing approaches for option 
pricing can be enhanced via SMC. Specifically, when pricing the arithmetic Asian 
option w.r.t a complex stochastic volatility model, it is shown that SMC methods 
provide additional strategies to improve estimation. 
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1 Introduction 

Monte Carlo methods have been extensively used in option pricing since the paper of 
[8]. Subsequently, there have been a wide variety of Monte Carlo approaches applied: 
quasi Monte Carlo (QMC), stratification, importance sampling (IS), control and antithetic 
variates, etc; see [33] for a thorough introduction. 

The importance of Monte Carlo for option pricing, against other numerical approaches, 
is the ability to deal with high-dimensional integrals. This is either in the time parameter 
of the derivative (path dependent options), or in the dimension of the underlying. The 
rate of convergence is 0(l/ViV), A^ being the number of simulated samples, supposedly, 
independent of the dimension. In addition, the methods are straight-forwardly extended 
to: 

• complex stochastic volatility (SV) models (e.g. [1]) 

• complicated financial derivatives (e.g. [9]). 

SV models are particularly useful to realistically replicate price dynamics. The latter point 
is relevant due to an increase in the volume traded of these instruments. Monte Carlo may 
also be used to calculate sensitivities/Greeks (e.g. [31]). As frequently noted in the option 
pricing literature, standard Monte Carlo estimates can suffer from a high level of variability, 
but can be improved using some of the methods mentioned in the above paragraph. 

We have thus stated that Monte Carlo methods are an important tool for option pricing, 
but can suffer from high variability. An often used technique to deal with this problem is 
IS (e.g. [31]). As is well known, the idea is to change the dominating measure such that 
the resulting Monte Carlo estimation benefits from a lower variance. In many financial 
applications the simulation is sequential, that is, the underlying is sampled at discrete time- 
points. This often yields Radon-Nikodym derivatives that can be re-calculated sequentially 
in time. It is also well-known (e.g. [271 1331 EI]) that the variance of these weights increases 
with time; hence IS needs to be improved in order to yield accurate estimates of the option- 



price. 

This above problem can typically be solved using sequential Monte Carlo methods. This 
is a class of IS methods that are extensively used in engineering, statistics and physics. 
However, to our knowledge, these ideas are seldom used in the option pricing literature 
(see [171 Eni HH [581 EE] for the few applications we were able to find). The purpose of this 
article is thus two-fold: 

1. To provide an up-to date literature review of SMC methods, particularly focussed on 
option pricing and sensitivity analysis. 

2. To illustrate that such methods can help to push the boundaries of the models for 
which option prices can be calculated, accurately. 

In terms of [H it seems that such methods are not well understood in the financial engineering 
literature, or at least the benefits of their application in option pricing is not appreciated. 
Therefore, we aim to review such methods and illustrate their use as well as improvement 
over standard approaches. Indeed, in some cases it is even possible to calculate option 
prices that are not subject to time discretization error [29]; this requires SMC methods. In 
relation to [21 an SMC algorithm is introduced to compute the value of arithmetic Asians, 
when the underlying is modelled by a stochastic volatility model. The volatility follows a 
non-Gaussian Ornstein-Uhlenbeck process [1]. This problem cannot be easily solved using 
ordinary IS and deterministic methods; see Section[4|for further details. Note, that it should 
not be seen that SMC methods are competitors to existing methods in option pricing, but 
simply that they enrich the methodology that can be used: in many cases SMC can be 
combined with existing ideas such as stratification (see e.g. [2B] for some recent work). 

It is remarked that the application of SMC methods can substantially reduce the vari- 
ance of ordinary Monte Carlo and IS estimators; this at the cost of an increase the com- 
putational cost. In other words, the methods can be the most accurate in comparison to 
other approaches. Due to the above statements, the methods reviewed here in most cases 
could not be used at high frequency, but if solutions are required in minutes can be actively 



used in finance (although see [50] for computational hardware that may make the methods 
even more applicable). Note, also the ideas differ from parametric IS, where the proposal 
lies in a parametric class, and is found to minimize some criterion (e.g. as in the cross 
entropy method see [03] for details). Since these methods are not SMC techniques, we do 
not review them here. 

This article is structured as follows. In Section [2] we discuss an example to motivate the 
application of SMC methods for option pricing. In Section [3l SMC methods are detailed 
along with some of the latest developments. Illustrations are given on various examples to 
show that SMC methods can enhance existing Monte Carlo approaches in option pricing. 
In Section H] an original SMC approach is designed for pricing Asian options, using the 
Barndorff-Nielsen & Shephard (BNS) SV model [1]. In Section |5] the article is concluded 
and some avenues for future research are discussed. There is an appendix which gives the 
proof of a result (in Section |3]) and some details from the example in Section HI 

2 Motivating Example 

2.1 Some Conventions 

Some conventions are given. Recall that a Monte Carlo procedure simulates N independent 
samples, X^^\ from a density, it and estimates vr— integrable functions h via 

Importance sampling follows much the same formula, except sampling from an g ^ tt and 
using the estimate 

N 

^«;«/i(X«) 
with w^^^ = d7i/dq{X^^^). The following notation is used. For any {i,j) E Z+, i < j, 



X. 



i:j •- 



{xi, . . . ,Xj). A process is written {Xt}te[o,T]- A measurable space is denoted {E,S). 



Given a sequence of spaces Eq, . . . ,En (resp. a— algebras So, ... , £n) the product space is 



written as E[o,n] (resp. product a— algebra £^[o,n])- For a probability vr and vr— integrable 
function h, the notation n{h) := J h{x)'7i{x)dx is sometimes used. The dirac measure 
on {x} is written 6x{dx'). Probability densities are often assigned a standard notation p. 
Expectations are written generically as E and a subscript is added, if it is required to denote 
dependence upon a measure/point. Also, for p G Z+, Tp = {1, . . . ,p}. 

2.2 Price Process 

Throughout, the price process {St}te[o,T] St G M*^ follows a general jump-diffusion of the 
form 

dSt = fi{St)dt + Vta{St)dZt 

where {Zt}te[o,T] is a Levy process, and the volatility {Vt}te[o,T], may be deterministic, or 
may follow 

dVt = a{Vt)dt + P{Vt)dUt 

where {Ut}te[o,T] is a Levy process, that may be correlated with {Zt}te[o,T]- This permits 
a wide-class of stochastic volatility models, which can accurately replicate the stylized 
features of returns data (e.g. [1]). All expectations are taken w.r.t an equivalent, structure 
preserving. Martingale measure, Q; this will exist for our examples. 

2.3 Barrier Options 

The first example is the calculation of barrier options. These are derivatives for which the 
payoff may be zero, dependent upon the path of the underlying {St}teh I C [0,T], hitting 
a barrier. Our examples will concentrate upon European style options 

with 

mSt}tei) = lA{{St}t^i)e-''^{ST - K)+ 
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a barrier call option, with strike K > 0, interest rate r > and A the barrier set. It is 
assumed that the initial value of the underlying lies inside (resp. outside) for knock-out 
(resp. knock-in) options. For example, for a discretely monitored knock-out barrier option, 
I = {ti, . . . ,tm : < ti < ■ ■ ■ < tm = T}, to = 0, d = 1, barrier set A = ^'^A'^u^K]^ the 
arbitrage price is: 

where ft- summarizes all the random variables induced by the volatility process (e.g. includ- 
ing integrated volatilities) and it is assumed that the possibly unknown transition densities 
can be written w.r.t a dominating measure d{st^;t^^v^;tm)- There are a large number of 
publications associated to the Monte Carlo calculation of barrier options, including: P|I53]: 
Zhao et al. (2006) and for the calculation of the Greeks [6l [31] . 

2.3.1 A Numerical Example 

On inspection of ([T]), it is clear that the estimation of the barrier option is not a simple 
task in Monte Carlo integration. For example, even if we are able to assume that 

• the transition densities are known 

• it is possible to simulate from the process 

or if an Euler discretization is adopted, it is still the case that many paths may yield a zero 
Monte Carlo estimate: they may knock-out before the terminal time. A simple remedy to 
this problem can be found in [35]. When the volatility is deterministic, the authors sample 
from the one-step process, conditioned so that it does not knock-out. This corresponds to 
an IS procedure with weights: 



w = 

i=l 



Y[\ / P{su\st,_„vt^_^)dst^. 



The weights need not be known, but an unbiased estimate of them is required; see Section 
13.4. H but the point is established in [35]. Note, as stated in [35], this idea only reduces the 



variance of ordinary Monte Carlo in terms of knocking out; it may be sensible to design an 
IS strategy that takes into account the nature of the terminal cost. 

The above idea, whilst very effective, is not always useful for large m - the variance of 
the weights will increase with time. This is illustrated in Table [H where a Black-Scholes 
model is adopted, with d = 1, r = 0.01, a = 0.75, K = Sq = 10.0, tj — ti_i = 0.5 and 
a time-homogenous barrier A = [5.0, oo)™; a down-and-out option. Monte Carlo methods 
are not needed here, but the idea is to show how SMC methods help; a more convincing 
example can be found in Section |H The variance of the weights are approximated using 
the effective sample size (ESS, see [51]): 



ESS- ^^'=' ' 



Eti(-«)" 



If the simulated samples are dependent, it measures the number of samples that are in- 
dependent. Here, it gives us a measure of the variability of the weights: the closer to A^ 
(= 30000 here), the more the number of 'useful' samples. In Table [H it is seen that the 
variance increases with m; this is well-known (e.g. the Theorem in [IZ]). In some cases, 
the algorithm can stabilize; the ideas in the subsequent Sections are still relevant, but are 
potentially less useful there. 

It could be argued that any number of Monte Carlo enhancement methods could be used 
to deal with weight degeneracy; however, none of these approaches are explicitly designed 
to deal with this. However, SMC methods can deal with this problem; see Section 13.3. Ij 
for some associated drawbacks. As noted in Chapter 7 of [33], the well-known likelihood 
ratio method [TT| [32] for greek calculation, suffers from exactly this problem. As a result, 
SMC methods will be useful for a wide variety of option pricing and hedging problems. 



m 


5 


10 


15 


20 


25 


ESS 


21826.90 


13389.60 


8710.91 


5909.51 


4139.27 



Table 1: Effective Sample Size for an IS Estimate of a down-and-out Barrier Option. We 
used a Black-Scholes model with r = 0.01, a = 0.75, a = 5.0, b = oo. 30000 samples were 
simulated. 

3 Sequential Monte Carlo Methods 



3.1 Sequential Importance Sampling 

We now introduce a general methodology of sequential importance sampling (SIS), in a 
particular context. Sequential Monte Carlo techniques can be traced back to at least the 
1950's in [lO] and [60]; see [51] for a historical review. These ideas have been developed 
within statistics (e.g. [27]), physics (e.g. [S]), engineering (e.g. [38]) and operations research 
(e.g. [37]). There is no claim that the reference list is exhaustive, due to the volume 
of publications in this field. Much of this review focuses upon the statistical literature 
(with numerous exceptions), since it appears such ideas are not often used in the financial 
engineering literature. 

The basic idea is as in ordinary IS, the only real difference being that calculations 
are performed sequentially. This sequential formulation allows us both to theoretically 
understand the algorithms as well as to derive more advanced versions. The simulation 
begins by simulating a collection of samples in parallel and the importance weights are 
computed in a sequential manner. These samples, in Section 13.21 will interact. In the 
statistics and engineering literature, it is typical to call the samples particles and this 
terminology is used interchangeably throughout the paper. 



3.1.1 Formulation 

Let {{En, Sn)}o<n<m be a sequence of measurable spaces. It is assumed that it is of interest 
to simulate from and compute expectations w.r.t a sequence of related probability measures 



{'^n}o<n<p on measurable spaces (£'[o,n]7^[o,n]), that is, a sequence of spaces of increasing 
dimension. Throughout the paper, {7r„}o<n<p will often be referred to as 'targets'. Note, 
in some scenarios these targets can be artificial/arbitrary and provide the potential user 
with an extra degree of freedom to design the algorithm; this is illustrated in Section 13.1.21 
Introduce a sequence of probability measures of the following standard form, for n > 0: 

with Mn : -E[o,n] — ^ ^^ a probability kernel which we are able to simulate from, Q_i(-) := 1 
and Mq = rjQ : Eq ^ M^ a probability density on Eq. 

The simulation idea, in Figured! is then essentially associated to the simple importance 
sampling identity: 

L hn{xQ;n){ nr=0 Wi{XO.i)]Qn{Xo;n)dXo;n 



(2) E^J/i„(Xo.„)] 



L[„,„, { ]Ti=o'^i{XO:i)]Qn{Xo:n)dXo.,n 



(3) W.{xo..) - ""^^"^^ 



Tri_i{xo:i-i)Mi{Xi\Xo;i-l) 

(7r_i(-) := 1) in the above equation ([2]) there is division by 

~ n 

/ {Y\^i{XO:i)}Qn{XO:n)dXo:n 

to ensure that the incremental weights ([3]) need only be known point-wise up to a normal- 
izing constant. The following biased (for finite A^, but provably convergent as A^ — ?■ cxd) 
estimate is employed 

N 



<{K) = Y.^n{x^-n>n 



1=1 



(4) «;(') - — nr=o^»(4!: 



E;=i{nr=oW^^(<i)} 

where Xq'^^ is the /*" sample at time n. From herein it is assumed, unless otherwise written, 
that the incremental weights are the un-normalized versions. The normalizing constant is 
(abusively) defined as 



» n 

^ri'= I {WWi{Xo.,i)]Qn{XQ;n)dXo.,n. 

•J E\n „l ■_n 



^[o,7i] i=o 
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• 0. Set n = 0; for each i G Tn sample Xq ~ r/o and compute Wo{xq ). 

• 1. Set n = n + 1, ii n = m + 1 stop, else; for each i E T^ sample Xn {x^.^n-i ~ 
M„(-|a;o*„_i), compute Wn{xQ^n) ^^^ return to the start of 1. 

Figure 1: A Generic SIS Algorithm. 
In order to select the M„, a conditionally optimal density is [26] : 

(5) Mn{Xn\Xo:n~l) = 7r„(x„,|Xo:„_l). 

The proposal is optimal, in terms of minimizing the variance of the incremental weights, 
conditional upon XQ-n-i (see [25] for some limitations). Algorithms which can be expected 
to work well, will attempt to approximate this density; see [26] and the references there-in 
for details. 

There are a large number of extensions of the SIS method. We list some references here: 
[m EH ESI [55]. In the paper [21], the algorithms are combined with Markov chain Monte 
Carlo (MCMC) methods; this is discussed later in Section 13.2.11 

3.1.2 The Barrier Option Revisited 



In the case of the barrier option problem, in Section 12.31 if it is not possible to simulate 
from the process, but the transition densities are known, then we can introduce a process 
via Q. The formula i^ can be approximated using SIS, with the incremental weight 

Mi = q. Any discretely sampled option pricing problem can be written in this form. Indeed, 
in more generality, we may try to incorporate a terminal reward into the target densities, 
{vr^}. For example, the optimal importance density is 

^ i=l ^ 



11 

Then, it is sensible to look for optimal sequences of proposals that approximate this (as in 
[53]). In theory, the sequence of densities 

is a 'sensible' path to the optimal IS density; however, they cannot be computed. One 
strategy to circumvent this, in |l2], is to introduce a monotonic transformation of the 
potential {S — K) + , say g, at some time, close to the terminal time in the target density 

<°'^(^0,Sii:i„,t;ti:tJ OC^((^i„ -K) + )l Y[^[at,MA^u)p{Su,Vt^\St^_^Vt,_J >p{Vo). 

^ i=l ^ 

Note, in many cases vr^°'^(-) < C„7r^"'^(-) for C„ G (0,+oo), C„ > C„+i, therefore it is 
sensible to introduce the potential function at some time different than 1. In this case, the 
estimate 



(6) Z^ Yl 






t.« 



can be used. It is described how to approximate the normalizing constant Z^, below. 

3.2 SIR 

As we saw in Section 12.3. Ij the SIS method will not always work as the time parameter 
increases. Before continuing, there are related methods in rare event simulation, termed 
multi- level splitting (e.g. [IH])- These techniques are related to SMC as they are approxi- 
mations of multi-level Feynman-Kac formulae [HI [12]; SMC algorithms are approximations 
of 'standard' Feynman-Kac formulae (as in [IE])- Indeed, SMC algorithms related to split- 
ting are given in [T3]. Since most option pricing problems are not of rare-event form, we 
do not discuss the ideas of splitting any further; see [IH] and the references therein for an 
introduction. 
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The formulation is as in Section I3.1.1J to simulate from a sequence of related probability 
measures on state-spaces of increasing dimension. The following resampling scheme is 
inserted into the SIS algorithm: At time n of the algorithm A^ particles are sampled, with 
replacement, from the current set of particles according to some stochastic rule with the 
property 

(7) E[Ar;|x(!L] = iV^« 

where iV^ is the number of replicates of the i^^ particle at time n. The most basic way 
this can be achieved is by resampling the particles according to the normalized weights 
{wn }i<i<N- There are a variety of superior methods for resampling the particles: residual, 
stratified, systematic etc; a full description can be found in |27]. The systematic method 
is used here. This sequential importance sampling with resampling, is termed sequential 
importance sampling/resampling (SIR) (see (JSIIHIEZ])- The simulated paths are no longer 
independent, but there is a well-established convergence theory; see [18]. In addition, there 
is a theoretical advantage to resampling the particles; the asymptotic variance in the central 
limit theorem [18] can be upper-bounded, uniformly in time, with resampling; this is not 
necessarily the case otherwise - see [T3] . 

The algorithm is given in Figure |2l The particles can be resampled at any time step 
of the algorithm, but it is best to do this when the weights are very variable; e.g. when 
the ESS drops below a pre-specified threshold. This is theoretically valid, as established in 
|20] . The reason for the above is that if the particles are resampled too often, then there 
are too many sampled paths which have been replicated - the paths degenerate; see Section 
l3.3.1l for further details. In addition, if the optimal proposal (|5]) can be used, then it is best 
to sample after the resampling operation has occurred. This will increase the number of 
unique samples and lower the variance in estimation. 

To estimate the normalizing constants, use 



z„ = n 



Tn-l -07 



^k 1 
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0. Set n = 0; for each i G Tn sample Xq ~ r/o and compute Wo{xq ). 

1. Decide whether or not to resample, and if this is performed, set all weights to 
Wn = l/N and proceed to step 2. If no resampling occurs, the weights are as in 
equation (jl]). 

2. Set n = n + l, iin = m + l stop, else; for each i E Tn sample Xn \xQ.n_i ~ 
Mn{-\xQ.li^_i), compute VF„(xo.„) and return to the start of 1. 

Figure 2: A Generic SIR Algorithm. 



with 



w #- = ^E 11 »'.!-■ 



TV kr 

'='■-1 i = \ j = kr-l + l 

where ko = 0, kj is the j"^ time index at which one resamples for j > 1. The number of 
resampling steps between and n — 1 is denoted r„_i and we set kr^ = n (to ensure that 
the final term includes Z„ in the numerator, as is required for correctness). 



3.2.1 SMC Samplers 

A useful algorithm related to SIR is termed SMC samplers [21]. This method is designed 
to sample from a sequence of target distributions on a common space; i.e. {7r„}o<„<p are 
probabilities on {E^ £). However, due to a difficulty described below, the algorithm samples 
from a sequence of distributions of increasing dimension. The marginal of each new density 
is the one of interest. 

Suppose we initialize an IS algorithm with A^ particles {Xq }i<j<Ar sampled according 
to some initial density, ?7o, and weight w.r.t ttq. Further, suppose that the IS works well; 
this can be achieved by making ttq very simple, e.g. r/o = ttq. Now, under the assumption 
that ttq and vti are not significantly different, it might be expected that we can construct a 
Markov kernel Ki{xq,xi) so as to move the particles from regions of high density of ttq to 
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the corresponding regions of tti. In such a case, the importance weight would be: 

Equation (Q presents some major difficulties: 

• In many cases the integral in the denominator i^ cannot be computed. 

• In some cases, Ki is not known point-wise. 

For the above reasons, it appears that such a weighting scheme seems destined to fail. For 
example, even when there is the computational power to approximate the integral in the 

denominator of (jH]): 

1 ^ 

the second point will prohibit its application. 

The solution is to modify the problem to a more familiar setting. Introduce a se- 
quence of auxiliary probability measures {7r„}o<n<p on state-spaces of increasing dimension 
{E[o^n],£[o,n]), such that they admit the {vr„}o<n<p as marginals. The following sequence of 
auxiliary densities is used (see [H] and [52]): 

n-l 

(10) 7r„(xo:n) = 7r„(x„) J]^ Lj{Xj+i, Xj) 

j=0 

where {Ln}o<n<p-i are a sequence of Markov kernels that act backward in time and are 
termed backward Markov kernels. The algorithm samples forward using kernels {Kn}- The 
choice of backward kernels is made as the incremental weights are 

(11) iyn(x.-l:.)= ^nM^n.ix X^.,) ^^^ 

'^n—ly-^n—lj-'^ny-^n—l: •^n) 

which allows for a fast computation and avoids a path degeneracy effect (see Section l3.3.ip . 
It is clear that ( ITOl) admit the {vr„} as marginals, and hence, if one sequentially samples 
from {tTjj} we are left with a problem that is the same as for SIR. That is, one uses the 
algorithm in Figure [2] with the kernel K^ as a proposal (instead of Mn) and incremental 
weight (TTTI) . A discussion of the optimal choice of backward kernels can be found in [2T] . 
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3.2.2 The Barrier Option Revisited 

Two examples of SIR algorithms are now presented. The first, is a straightforward modi- 
fication of the SIS method demonstrated in Section I2.3.1I In this case the target densities 
are 

T^nivO^Sty.tJ OC <^ Y[l[at^,btA^k)PiSu\sU-l) 
^ i=l 

with importance densities as in Section [2.3. 1[ As a result, this approach is as in [35], except 
with resampling. The second idea is as in [12]: include the function {S — K) in the target. 
This is to reduce the variance in estimation, when considering the call option. The function 
used is 

g{S-K) = \{S-K)r, 

K is referred to as a temperature parameter. See ^2] for a justification of this choice of g. 




(a) No Potential. (b) Potential. 

Figure 3: Controlling the Variance of the Weights. We used an SIR algorithm to apply 
the method of ^35j to a simple Black-Scholes model (left). In the right plot we used the 
approach of [42]. The settings are as in Section [2.3.11 and there are m = 25 time steps of 
the algorithm and 30000 particles are simulated. 



In Figure [3] (a) the performance of the first SIR method can be seen. In the Figure, we 
have used the Black-Scholes model with ?7i = 25 and A^ = 30000 particles; the particles are 
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resampled when the ESS drops below 15000 particles. The algorithm exhibits very stable 
variance properties, resampling only 2 times. As a result, and in comparison to the results 
of Table [H the algorithm is much more useful for the estimation of the down-and-out barrier 
option. Note that there is a marginal increase of the overall CPU time. The CPU time is 
less than 10 seconds on a Pentium 4 2.4 GhZ machine, coded in C++; all code is available 
upon request from the author. 

In Figure [3] (b) the performance of the second SIR algorithm can be seen. In this case 
the potential is introduced at time 10, with n = 0.08. At subsequent time steps, this value 
increases by 0.045. These values are set by some prior tuning, but it is possible to do this 
adaptively; see |13]. The issues here are: 



• When to introduce the potential 

• How fast the temperature should be increased. 

In general, the potential could be introduced quite early in the simulation. This would 
allow the samples to adapt to the potential, but at the cost of increasing the variability 
of the weights. In practice, we have used between one-third and two-thirds of the overall 
algorithm time-length. The second issue is also important: to reduce the variability of 
dnj, the temperature should be (at the final time-step) bigger than 1. However, if the 
temperature increases too rapidly, then algorithm has to resample too often. The stability 
of the weights is not as good as in Figure |3] (a). This is due to the introduction of the 
potential function. However, let us consider the actual quality in estimation of the down- 
and-out option, a significant improvement can be seen. Both approaches are run 25 times, 
with 30000 particles, resampling threshold of 15000, as well as computing the analytic 
approximation in [10]; the estimates of the options (±2 standard deviations across the 
repeats) are 76.81 ± 13.81 and 6.03 ± 0.43, with analytic approximated value of 6.16. This 
illustrates a well-known fact about IS: the samples have to be pushed towards regions of 
importance, in terms of the functional of interest. The monitoring of the variance of the 
weights may not be enough to yield sensible estimates for option pricing. 
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3.3 SMC Methods for Sensitivities 

We now review some SMC methods for estimating derivatives of expectations. This can 
be useful for calculating the Greeks, especially when the transition densities are not known 
and Euler approximations are adopted via the likelihood ratio method. There is a growing 
literature on Malliavin calculus approaches [6l [31], even in the case of quite sophisticated 
SV models [5]; these problems could be dealt with using the SMC approaches in the other 
Sections. Indeed the rates of convergence, in comparison to the likelihood ratio method, 
are faster [23], although in many cases the Malliavin weight is not known. 



3.3.1 Path Degeneracy 

In option pricing, most Greeks which are not calculated through Malliavin methods are of 
the form 



(12) A(<l>,sc 



/ "^(Sii:*™.)! Y[p{Su^'Vu\Su^,,Vt^_j\p{vo)d{St^.,t^,Vt,y,t^ 



Monte Carlo methods for computing this quantity can be found in Chapter 7 of [33], see 
also [2]. It is explicitly assumed that the transition density is known and upper bounded, 
and that the derivatives of this quantity can be calculated analytically. 

One of the problems of estimating ( iT2l) using IS methods, is that as seen in Section 
I3.2.2[ resampling is often required to control the variance of the weights. However, a key 
problem for estimating ( TT2l) is that the integral is on a path space. For example, approaches 
which calculate expectations (option prices) can use the same set of particles to estimate 
the sensitivities (Greeks) [13]. The difficulty is the following: the resampling mechanism, 
whilst necessary to control the variance of the weights, induces a path degeneracy effect. 
There may be many unique particles at the current time, but going backward in time yields 
paths which coalesce to the same parents. That is to say, that Xo^n-i are not re-simulated 
at time n. Hence, due to resampling, many of the xom—L+i (L > 0) will be the same across 
the simulated samples. In other words SMC algorithms are only useful for calculating 



expectations of the form: 



Je\ 



'''\^n~L-.n)'^n\^n~L-.n)(^'^n~L:n 



[n — L ,n\ 



for some fixed lag L > 0. See [T] and [TS] for more details. It should be noted that it will 
still be possible to compute the sensitivities of a wide-class of option prices. 



3.3.2 Marginal Approximations 

In this Section it is shown how to estimate the Greeks, when using the same set of par- 
ticles to estimate the option price. The method can only be used if there is a Markovian 
dependence of the {St^, Vt„) G En. For simplicity vq is dropped, but can straight-forwardly 
be re-introduced. 

The idea is associated to the likelihood ratio method; assume the pay-off function is of 
the form 

m 
i=l 

note that there are wide class of options that can be written this way. Assume that only 
the process depends upon a parameter 6' G O. Then the sensitivity is, assuming the validity 
of interchanging integration and differentiation (e.g. |18]) 

d_ 

We 






where 6 is the parameter of interest. To obtain the sensitivity of interest we present the 
following result, whose proof is housed in the appendix. 



Proposition 1. Let, forn > 2 



(13) 



A„(rfx„) 



An-l{dXn-l)pe{Xn\Xn-l)^{St^] 



d 



CIJLyi \ 



n„,_i((ix„_i)--{p0(x„|x„_i)}$(st„; 



HitXj Yi 



where 



Iin{dXn) 



Iin-l{dXn-l)pe{Xn\Xn-l)^{St„] 
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Xn = ist„,vtj and 



Ai{dxi) 
Ili{dxi) 



d 

-QQ{Pe{xi\xo)}^{st^, 



dxi 



Pe{xi\xQ)^{st^\ 



dxi. 



Then, A„(l) = Ag{<^,so). 



In the case of the Greeks A and F, the second expression on the RHS (fT3!) is not needed, 
due to the dependence of only the first term in the product nr=i Pdi^il^i-i) o^ -^o- 

The approach below is a special case of that detailed in [56] (see also [57]). The objective 
is to recursively approximate the measures {A„} and {n„}. The initialization can be 
achieved by IS. Then, given an empirical approximation at time n — 1 of A„_i, say: 



N 






j=l 



and of n„_i, with different weights Wn-i, then the following is a point- wise approximation 
of the signed density 



A^^(x„) = $(.,„) 



N 



N 



d 



J2W;^lMXn\xtl) + J2^:^-^7^{Pe{Xn\xtl)} 



i=l 



i=l 



.(i) 



i=l 



Assuming that new particles < X„ > are sampled from some density \E'^(x„), then using 
IS, the following approximation is obtained 

AT 

where 

(14) 
Note 



^^(i) ^ ^K \Xn ) 

" iVv|/^(a;«)' 



77^{i) _ 2^3 = 1 ^^n-lPd\Xn IJ-n-lJ 
" ~ iW(xF) 
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The computational complexity of computing the weights is 0{N'^). This is required, 
to avoid the path degeneracy problem. It should be noted that an asymptotically biased 
approach, which is of 0{N) could, in some cases, be used from [53], with the identity 



|/m..(...^/m.|.o,.,.„., 



x)dx. 



See [S3] for further details. We remark, also, that the work of [221 [23] can also be used, in 
some contexts, for greek calculation, but do not review these new techniques here. 

3.3.3 Barrier Option Revisited 

We return to the Example in Section 12.3.11 We calculate the A and the V using the 
recursion in flT3|) . The model settings were as in Section [2.3. II The algorithm is run with 
10000 particles, 25 times which took 30 hours; i.e. 1 hour 20 min to run the algorithm 
once. The A was estimated as 0.22 ± 1.14 and the V was —29.7 ± 138.34. Note that the 
error is ± 2 standard deviations for the 25 repeats. The ratio of the estimate to the ± 2 
standard deviations are approximately 1/5 and perhaps too high; reducing the variability 
is the subject of current research. The CPU time is not substantial and shows that the 
given recursions are potentially useful for at computing A and F, but other path-based 
sensitivities may require biased estimates as in 



3.4 SMC Methods for Continuous-Time Processes 

In the context of continuous-time processes, there are discrete-time SMC algorithms which 
can approximate the continuous-time Feynman-Kac formulae with no approximation-bias 
(see also [6Tj). 

3.4.1 The Random Weight 

An important comment, as realized by [02], on IS is the following. It is not necessary to 
know the weight exactly; only to find an unbiased estimate of it. More exactly, let n = 
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for simplicity. Suppose it is possible to find some function dn '■ En ^ F^ ^^ M^ and a 
probability pn so that: 

then 

T^nihn) = — hn{x)Wn{x)rin{x)dx 

^n J En 

= -^ I fn{x){ I ^n{x,u)pn{x,u)du[r]n{x)dx. 
^n Je„ L Jf„ ) 

As a result, we may perform joint IS by sampling from p„ x ?7„. This comment is extremely 
useful for option pricing; it has already adopted, by at least [35]. Moreover, it can be useful 
for calculating expectations w.r.t probability laws of Levy processes. 

3.4.2 Barrier Option Revisited 

The ideas here are based upon the random weight, and the exact simulation of diffusions 
methodology [71 [29]. In many cases, the option price can be written as an expectation w.r.t 
the transition densities, but that the densities are not available up-to a constant or cannot 
be simulated. However, it may be that transition laws are known indirectly e.g. in terms 
of their characteristic function; see [5^] for some solutions. 

Consider ([1]) with deterministic volatility and the underlying is a diffusion satisfying 
the SDE 

dSt = ^{St)dt + dWt 

with ^ = VH and some additional assumptions in [29]. Then the transition is 

P(st,kt,-i) = 0(sf,;Si,_,,(ti-ti_i))exp{H(5iJ -S(5i^_, -/(ti-ti_i))} x 



(15) E^ 



exp { - j C{Ws)ds] 

J\0,T] 



'[0,T] 

where the expectation is taken w.r.t the law of a Brownian bridge starting at St-_-^, ending 
at Sti and the other parameters/functions are known exactly; see [29] for details. Therefore, 
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we can use an SIR algorithm with an incremental weight 

that is not known exactly. However, it is detailed in both [7] and [29] how the expectation in 
( TT5l) can be estimated unbiasedly; the random weight idea can be applied. In our experience 
the estimation of 



K. 






exp { 



[0,T] 



C{Ws)ds} 



through the methodology in [29], can lead to substantial increases in variance of the weights. 
In high-dimensional cases it may be difficult to use this idea, without some extra variance 
reduction methods. A similar idea can be found in [61] when approximating 

e! 



^(iyT)exp<j / U{Ws)ds 

[0,T] 



See [M] for a review of related issues. 



4 An SMC Method for Pricing Asian Options 

In the following Section we present a method to approximate the value of a fixed strike, 
arithmetic Asian option, when the underlying follows the system 

dYt = fidt + VtdWt 
dVt = -Vtdt + dUt 

where Yt is the log price, {Wf}tg[o,T] is a Brownian motion and {f/(}tg[o,T] is a pure-jumps 
Levy process, such that the marginal of VJ is a Gamma distribution; see [1] and the appendix 
for details. This latter model has been shown to fit, to an extent, the dynamics of real 
price data. 

This problem is particularly difficult for Monte Carlo methods, even in the Black- 
Scholes case (see, however, [13] for some work on the exact approximation with continuously 
monitored average). The reason is due to the path-based nature of the payoff, which can 
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contribute a substantial variance in the estimate. Indeed, given the methodology explained 
thus far, it is not straightforward to use SMC methods for exactly the previous reason; 
see Section 13.3.11 Some sensible solutions (to the form of the pay-off), in the Black- 
Scholes scenario, can be found in [M] and [IB]. Also, note that efficient techniques for 
partial differential equations (PDEs) can be found in [611 ES] . These methods appear to 
be adaptable to our case, but would lead to solving a partial integro-differential equation 
which is not always simple and accurate; see [16] for some ideas. 



4.1 Strategy 

The idea is to try, as in the PDE methods above, to reduce the dimension of the problem. 
In essence, we seek to sample from the law of the sum of the underlying at each monitoring 
period. The problem is that this is only known in terms of an intractable integral. However, 
it is shown that it is possible, using SMC methods to sample from the optimal importance 
density, such that it is a marginal on an extended state-space. 

In our case, ignoring the discount term, the value of the option is 



•^ \ 4 = 1 / + ^ j = l 



vi:u)pM \d{yt,.,t^,vt,.,t„ 



where Vt^ is known and suppressed from the notation and f^. is a 2-dimensional Poisson 
process on [0, Xuiti — tj_i)] x [0, 1]. Making the transform, st- = exp{|/i-}, and then 

Si- = Ui- — Vt ^ i > 2 
yields the option price 



m 



K\ \ n'^^'.-l{^*«'^'^^*-2:t,-l),5-i(fl:tJMftJ |rf(z/i,: 



tjYi "J Cl .tjf 



where v^vt _ {t-'u, /^(^t,-2:ti-i)5 ^{'^i-.u)} is the shifted log-normal density with location param- 
eter 

^i{vu.r.ti-i) = ^og{vu_^ - vu_^) + ^l{ti - ti_i) 
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scale parameter as in [1], i't_^ = 0, utg = Sq and see the Appendix for the definition of a. 
If there was not a path dependence (see Section [3.3. II) on the integrated volatihties tk^.t^, 
this transformation would greatly improve SMC methods. This is because one needs only 
estimate an integral w.r.t the marginal Vt„^- However, the problem is recast into a similar 
case as in Section I3.2.2J as a result the method detailed below, is restricted to the case 
where m is not large (m < 24). 

4.2 Simulations 

4.2.1 SMC Method 

The SMC approach is similar to the second method in Section 13.2.21 The difference is 
as follows. In prior simulations it was found the temperature k could not be too large 
(k < 0.5) to avoid a substantial weight degeneracy effect. This is when introducing the 
potential in the middle of the simulation. If the potential function was introduced very 
early, the path degeneracy effect occurred; despite the fact that the temperature is high, 
which can reduce the variance in estimation, the algorithm resampled too often to yield 
poor estimates. 

To allow the temperature parameter to reach 1, an SMC sampler was adopted. That 
is, once time m is reached the SMC sampler simulated from 



7rn(l^ti:t„,Wti:t„) OC 



m 



n '^'^H-i ^^^^ ' /^(^*-2 ^*-i ) ' ^i (^i^*« ) }P(^*« 



i=l 



for K < ki < ■ ■ ■ < kp = 1. Here k is the final temperature of the SIR algorithm. In all 
cases, p = 20 and the k's increase at a constant amount; see [12] for automatic schemes. 
The initial SIR is used for computational efficiency; it is less costly to run this algorithm 
than SMC samplers. Note that the estimate of the option price is as in ([6]). 

For SMC samplers, the particles are simulated according to an MCMC kernel, Kn-, of 
invariant distribution vr^; details are given in the appendix. The backward kernel used in 
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0. Use an SIR algorithm (as in Figure [2]) to sample from the sequence of densities: 



Trn{^h:t,,,Vt,:tJ OC 






-K 



K.n r n 



n '^''h^l {^t.; /^(^t.-2:t.-l), 5-i(fl:f J}p(ftJ 



i=l 



with n G Tm, {ft;n}i<n<m giveu (0 < Ki < ■ ■ ■ < Km < !)• The process densities are 
used as the proposals. 



1. Given the samples from Tin{i-'ti:tmi "^tr.tm) ^^e SMC samplers (as in Section [3.2. II) to 
sample from 



T^n{l^tr.t^.Vt^.,tJ OC 



^t„ 



m 



-K 



\\vut^_,{j^u-,Kj^u_2-.u_^).^iivi-.u)}vM 



j=i 



with n G Tp, k^ = hi < ki < ■ ■ ■ < kp = 1. Note that this is a sequence of densities 
on the same space, as opposed to the sequence in step 0. which samples from one of 
increasing dimension. The SMC sampler uses the MCMC algorithm in the appendix 
and has incremental weight (fT6|l . 



Figure 4: SMC Method to Approximate Arithmetic Asian Options. 



the SMC sampler (see Section [3.2. ip . is 



J-'n—l\-^n) ■^n—lj 



'^ny-^n—lj-'^ny-^n—ly •^n) 



where x„ = {i^t„jVt„), is such that the incremental weights at time n are 



(16) 



WJXn-l] 



^nl-^n— 1 J 



'ITn-l[Xn-l) 

The choice of backward kernel is quite popular in the literature and, in our experience, 
often works well in practice. The procedure is summarized in Figure HI 



4.2.2 Importance Sampling Method 

To compare the method above, the approach of [M] is modified. The main difference is to 
simulate the integrated volatility exactly, and to use the smallest value (over each interval) 
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in the numerical search for a change of drift. In this scenario, it was found that there was 
not always a unique root; this is in less than 1% of the cases in the results below. 

4.2.3 Numerical Results 

The algorithms are run 50 times, for similar CPU time; this was less than 150 seconds, 
m = 12, A = 1, ti — tj_i = 1 (Vz G Tm), /i = 0.07, z/ = 0.5; these values, when relevant, are 
consistent with the parameter estimates that were reported in [39]. The SMC method was 
run with 10000 samples, and in the SIR part of the algorithm the potential was introduced 
at time 6, with n = 0.2; this increased by 0.035 at each time step, until time 12 was 
reached. The process densities were used for the proposal densities. The MCMC moves, 
in the appendix had acceptance rates between 0.3-0.5 and this was quite a reasonable 
performance. The IS method was run with only 4000 samples. The higher CPU time, for 
IS, is due to the fact that a bisection method is run for each sample (which was allowed a 
maximum of 300 steps). 

The variance reduction factors, across the multiple runs, for 5 different settings of 5*0 and 
K can be observed in Table [21 Note that whilst the option can be standardized for 5*0, our 
aim is simply to show that the conclusions of this experiment are relatively invariant to the 
parameters used. In the Table, a substantial improvement for the same CPU times can be 
observed. It should be noted, however, for larger u (in the latent process) both algorithms 
do not work well. The latter parameter will increase the variability of the volatility process, 
and it is not clear how this can be dealt with. Note the results here should not be contrasted 
with (slightly disappointing) those in Section 13. 3. 3 j this technique differs substantially from 
the ideas there. 

5 Summary 

In this article we have provided a summary of SMC methods and shown the potential 
benefits for their application in option pricing and sensitivity estimation. Many of the 
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Settings 


Variance Reduction 


5o = l, A' = 0.1 


1.09x10" 


So = l,K = 0.9 


3.70x10" 


So = 5, K = 1 


4.37x10" 


Sq = 50, K = 10 


4.37x10" 


So = 50, K = 49 


1.67x10^2 



Table 2: Variance Reduction of an SMC Metliod against Importance Sampling. This is for 
an arithmetic Asian option, under the BNS SV model. The algorithms were both run 50 
times. 

examples presented in this paper have been based upon translations of algorithms that 
have been used in other contexts such as stochastic control (an exception is Section H]). On 
the basis of the work here, it is felt that the application of SMC methods focussed upon 
particular problems, is likely to be highly beneficial. 

The reader of this article should be cautious. There is no claim that SMC methods will 
always work well, and indeed be uniformly superior to other Monte Carlo methods, or even 
deterministic approaches. In the process of working on the article, many financial models 
were too complex to apply standard SMC methods. What is claimed, at least, is that the 
methods can help to push the boundaries of models that may be used for pricing and is a 
useful technique for any Monte Carlo specialist in option pricing. 
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Appendix 

In this appendix the proof of Proposition [1] as well as the model and MCMC method that 
was used in Section H] are given. 



Proposition [T] 

Proof. A more general proof, by induction, is given. Let m = 2, and / G Bh{E2) (the 
bounded measurable functions on E2), then: 
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application of the Fubini theorem gives 
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^{StJ^{St2)f{x2)-^{pe{x2\xi)pg{xi\xo)}dXi;2 
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Now assume that the identity holds ioi m = n and consider n + 1, / G Bb{En+i)'- 
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setting / = 1 completes the proof. 



D 



Model 

Consider a finite collection of time-points = to < ^^i < ■ ■ ■ < ^^m = ^; then the joint 
density of the log-price and integrated volatilities are: 



W<P{yu.yu_^+Ku-ti-i),ai{vty,t;)}p{vt^] 



i=l 

where (j){-]fi,a) is the normal density of mean /i and variance a and, with i G T^, vt^ = 
(a^.„,,r}.„,,nj) G [0, Az/(tj — tj_i)]"' x [0,1]"* x N is a 2-dimensional Poisson process on 
[0, \iy(ti — ti-i)] X [0, 1] of unit rate. The functions (Ji(-) are as follows. Let (Tq = vq and 
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• 1 \ '^i 



then 



MCMC 

The moves are based upon Metropohs-Hastings kernels; see [59] for an introduction. There 
are two main quantities to be updated; the I'ti-.tm ^^'^ the parameters 

that make up the volatihty process. From here-in simphfy the notation to i^i-.m- 

The Ui-m are updated, by picking an i G T^, uniformly at random, and sampling from 
its process density. The move is accepted or rejected, according to a standard Metropolis- 
Hastings acceptance probability: 

with 1 < i < m — 2] a similar formula can be calculated if z G {m — 1, m}. 

The {cii;ni^^i:niy^iy ■ ■ ■ y^v.n„^y^i^n^^^rn) are Updated in a similar way to [39]. Again, 
pick an z G T^ uniformly at random. Then, the number of points, rij is either increased 
by 1 (birth), or decreased, if possible, by 1 (death); the choice is made at random. If a 
birth occurs, the new aj,rj are sampled according to the process density. The Metropolis- 
Hastings acceptance probability, for a birth is 

^ ,^ UT=i ^'^.-A^f^ K'^j~2:j^l), ^j{Vl:i, , V'r.j)} ^ Xujtj - ti^i)d{rii + 1) 
]\7=i'P>^j-A^rA^^{^j^2:j-l),&j{yi.,j)} b{ni) 
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where b{-), d{-) are the probabihties of proposing birth and death moves. The death move, 
when Hi = rii + 1, has a ratio that is the inverse of that above. 
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